Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation

Authors

  • Russell J. Dickson
  • Lindi M. Wahl
  • Andrew D. Fernandes
  • Gregory B. Gloor
Abstract

BACKGROUND There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.

METHODOLOGY/PRINCIPAL FINDINGS We show that current criteria are insufficient to build alignments for use with covariation analyses, as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences improves contact prediction by covariation analysis. Finally, we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.

CONCLUSIONS/SIGNIFICANCE Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions, because they emphasize pairwise covariation over group covariation.
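
To make the idea of a pairwise covariation statistic concrete, the following is a minimal sketch that scores two alignment columns by their mutual information. Mutual information is used here only as a familiar stand-in: the abstract does not name the statistics the authors evaluate or propose, so this should not be read as their method. The function name `mutual_information`, the gap symbol, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j, gap="-"):
    """Mutual information (in bits) between alignment columns i and j.

    `alignment` is a list of equal-length strings, one per sequence.
    Rows with a gap in either column are skipped (an illustrative choice).
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != gap and seq[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0

    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (count_a * count_b)
        mi += p_ab * log2(count * n / (left_counts[a] * right_counts[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 1 vary together,
# column 2 is fully conserved.
toy = ["AKLD", "AKLD", "GRLE", "GRLE"]

covarying = mutual_information(toy, 0, 1)
conserved = mutual_information(toy, 0, 2)
```

In this toy case the covarying pair scores higher than the pair involving the conserved column, which carries no covariation signal at all. A misaligned block of sequences shifts residues into the wrong columns and can change such scores, which is the kind of sensitivity the abstract proposes to exploit for detecting alignment errors.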

Similar resources

Protein Sequence Alignment Analysis by Local Covariation: Coevolution Statistics Detect Benchmark Alignment Errors

The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High quality alignments are difficult to build and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficu...

I-COMS: Interprotein-COrrelated Mutations Server

Interprotein contact prediction using multiple sequence alignments (MSAs) is a useful approach to help detect protein-protein interfaces. Different computational methods have been developed in recent years as an approximation to solve this problem. However, as there are discrepancies in the results provided by them, there is still no consensus on which is the best performing methodology. To add...

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

Dengue virus type-3 envelope protein domain III; expression and immunogenicity

Objective(s): Production of a recombinant and immunogenic antigen using dengue virus type-3 envelope protein is a key point in dengue vaccine development and diagnostic researches. The goals of this study were providing a recombinant protein from dengue virus type-3 envelope protein and evaluation of its immunogenicity in mice. Materials and Methods: Multiple amino acid sequences of different i...

Designing Of Degenerate Primers-Based Polymerase Chain Reaction (PCR) For Amplification Of WD40 Repeat-Containing Proteins Using Local Allignment Search Method

Degenerate primer-based polymerase chain reaction (PCR) is commonly used to isolate unidentified gene sequences in related organisms. To design the degenerate primers, we propose using a local alignment search method to find conserved regions long enough to design an acceptable primer pair. To test this method, a WD40 repeat-containing domain protein from Beauveria bass...

Journal title:

Volume 5, Issue

Pages -

Publication date: 2010